Biswas et al 2013 Supplementary File 1

Input files used:

1. CRT prediction from Streptococcus thermophilus DGCC7710 GenBank:DGCC7710

ORGANISM: gi|134103876|gb|EF434469.1| Streptococcus thermophilus strain DGCC7710 CRISPR1 locus genomic sequence
Bases: 2225

CRISPR 1   Range: 38 - 2184
POSITION  REPEAT                                SPACER
--------  ------------------------------------  -----------------------------
38        GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  TGTTTGACAGCAAATCAAGATTCGAATTGT  [ 36, 30 ]
104       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AATGACGAGGAGCTATTGGCACAACTTACA  [ 36, 30 ]
170       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  CGATTTGACAATCTGCTGACCACTGTTATC  [ 36, 30 ]
236       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ACACTTGGCAGGCTTATTACTCAACAGCGA  [ 36, 30 ]
302       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  CTGTTCCTTGTTCTTTTGTTGTATCTTTTC  [ 36, 30 ]
368       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  TTCATTCTTCCGTTTTTGTTTGCGAATCCT  [ 36, 30 ]
434       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  GCTGGCGAGGAAACGAACAAGGCCTCAACA  [ 36, 30 ]
500       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  CATAGAGTGGAAAACTAGAAACAGATTCAA  [ 36, 30 ]
566       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ATAATGCCGTTGAATTACACGGCAAGGTCA  [ 36, 30 ]
632       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  GAGCGAGCTCGAAATAATCTTAATTACAAG  [ 36, 30 ]
698       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  GTTCGCTAGCGTCATGTGGTAACGTATTTA  [ 36, 30 ]
764       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  GGCGTCCCAATCCTGATTAATACTTACTCG  [ 36, 30 ]
830       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AACACAGCAAGACAAGAGGATGATGCTATG  [ 36, 30 ]
896       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  CGACACAAGAACGTATGCAAGAGTTCAAG   [ 36, 29 ]
961       GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ACAATTCTTCATCCGGTAACTGCTCAAGTG  [ 36, 30 ]
1027      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AATTAAGGGCATAGAAAGGGAGACAACATG  [ 36, 30 ]
1093      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  CGATATTTAAAATCATTTTCATAACTTCAT  [ 36, 30 ]
1159      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  GCAGTATCAGCAAGCAAGCTGTTAGTTACT  [ 36, 30 ]
1225      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ATAAACTATGAAATTTTATAATTTTTAAGA  [ 36, 30 ]
1291      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AATAATTTATGGTATAGCTTAATATCATTG  [ 36, 30 ]
1357      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  TGCATCGAGCACGTTCGAGTTTACCGTTTC  [ 36, 30 ]
1423      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  TCTATATCGAGGTCAACTAACAATTATGCT  [ 36, 30 ]
1489      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AATCGTTCAAATTCTGTTTTAGGTACATTT  [ 36, 30 ]
1555      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AATCAATACGACAAGAGTTAAAATGGTCTT  [ 36, 30 ]
1621      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  GCTTAGCTGTCCAATCCACGAACGTGGATG  [ 36, 30 ]
1687      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  CAACCAACGGTAACAGCTACTTTTTACAGT  [ 36, 30 ]
1753      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ATAACTGAAGGATAGGAGCTTGTAAAGTCT  [ 36, 30 ]
1819      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  TAATGCTACATCTCAAAGGATGATCCCAGA  [ 36, 30 ]
1885      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AAGTAGTTGATGACCTCTACAATGGTTTAT  [ 36, 30 ]
1951      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ACCTAGAAGCATTTGAGCGTATATTGATTG  [ 36, 30 ]
2017      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  AATTTTGCCCCTTCTTTGCCCCTTGACTAG  [ 36, 30 ]
2083      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC  ACCATTAGCAATCATTTGTGCCCATTGAGT  [ 36, 30 ]
2149      GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAGT
--------  ------------------------------------  -----------------------------
Repeats: 33  Average Length: 36  Average Length: 29

Time to find repeats: 10 ms

2. CRISPRFinder prediction from GenBank:DGCC7710

########################################
# Program: Crispr Finder Program
# Author: Ibtissem GRISSA
# Rundate (GMT): 10/12/2012 3:36:50
# Report_file: /var/www/html/CRISPR/tmp/crisprfinder/139.80.123.3_Dec_10_2012_03_35_21/tmp_1/tmp_1_Crispr_1
########################################
#=======================================
#
# Sequence: tmp_1
# Description: Streptococcus thermophilus strain DGCC7710 CRISPR1 locus genomic sequence
# Length: 2225
# Id: gi|134103876|gb|EF434469.1|
#
#=========================================================================
# Crispr Rank in the sequence: 1
# Crispr_begin_position: 38   Crispr_end_position: 2184
# DR: GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC   DR_length: 36   Number_of_spacers: 32
#=========================================================================
Spacer_begin_position  Spacer_length  Spacer_sequence
74                     30             TGTTTGACAGCAAATCAAGATTCGAATTGT
140                    30             AATGACGAGGAGCTATTGGCACAACTTACA
206                    30             CGATTTGACAATCTGCTGACCACTGTTATC
272                    30             ACACTTGGCAGGCTTATTACTCAACAGCGA
338                    30             CTGTTCCTTGTTCTTTTGTTGTATCTTTTC
404                    30             TTCATTCTTCCGTTTTTGTTTGCGAATCCT
470                    30             GCTGGCGAGGAAACGAACAAGGCCTCAACA
536                    30             CATAGAGTGGAAAACTAGAAACAGATTCAA
602                    30             ATAATGCCGTTGAATTACACGGCAAGGTCA
668                    30             GAGCGAGCTCGAAATAATCTTAATTACAAG
734                    30             GTTCGCTAGCGTCATGTGGTAACGTATTTA
800                    30             GGCGTCCCAATCCTGATTAATACTTACTCG
866                    30             AACACAGCAAGACAAGAGGATGATGCTATG
932                    29             CGACACAAGAACGTATGCAAGAGTTCAAG
997                    30             ACAATTCTTCATCCGGTAACTGCTCAAGTG
1063                   30             AATTAAGGGCATAGAAAGGGAGACAACATG
1129                   30             CGATATTTAAAATCATTTTCATAACTTCAT
1195                   30             GCAGTATCAGCAAGCAAGCTGTTAGTTACT
1261                   30             ATAAACTATGAAATTTTATAATTTTTAAGA
1327                   30             AATAATTTATGGTATAGCTTAATATCATTG
1393                   30             TGCATCGAGCACGTTCGAGTTTACCGTTTC
1459                   30             TCTATATCGAGGTCAACTAACAATTATGCT
1525                   30             AATCGTTCAAATTCTGTTTTAGGTACATTT
1591                   30             AATCAATACGACAAGAGTTAAAATGGTCTT
1657                   30             GCTTAGCTGTCCAATCCACGAACGTGGATG
1723                   30             CAACCAACGGTAACAGCTACTTTTTACAGT
1789                   30             ATAACTGAAGGATAGGAGCTTGTAAAGTCT
1855                   30             TAATGCTACATCTCAAAGGATGATCCCAGA
1921                   30             AAGTAGTTGATGACCTCTACAATGGTTTAT
1987                   30             ACCTAGAAGCATTTGAGCGTATATTGATTG
2053                   30             AATTTTGCCCCTTCTTTGCCCCTTGACTAG
2119                   30             ACCATTAGCAATCATTTGTGCCCATTGAGT
#=========================================================================
########################################

3. PILERCR prediction from GenBank:DGCC7710

pilercr v1.02
By Robert C. Edgar

WTDGCC7710.fasta: 2 putative CRISPR arrays found.

DETAIL REPORT

Array 1
>gi|134103876|gb|EF434469.1| Streptococcus thermophilus strain DGCC7710 CRISPR1 locus genomic sequence

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                  Spacer
==========  ======  ======  ======  ==========    ====================================    ======
        38      36   100.0      30  TTCATTTGAG    ....................................    TGTTTGACAGCAAATCAAGATTCGAATTGT
       104      36   100.0      30  TTCGAATTGT    ....................................    AATGACGAGGAGCTATTGGCACAACTTACA
       170      36   100.0      30  ACAACTTACA    ....................................    CGATTTGACAATCTGCTGACCACTGTTATC
       236      36   100.0      30  CACTGTTATC    ....................................    ACACTTGGCAGGCTTATTACTCAACAGCGA
       302      36   100.0      30  TCAACAGCGA    ....................................    CTGTTCCTTGTTCTTTTGTTGTATCTTTTC
       368      36   100.0      30  GTATCTTTTC    ....................................    TTCATTCTTCCGTTTTTGTTTGCGAATCCT
       434      36   100.0      30  TGCGAATCCT    ....................................    GCTGGCGAGGAAACGAACAAGGCCTCAACA
       500      36   100.0      30  GGCCTCAACA    ....................................    CATAGAGTGGAAAACTAGAAACAGATTCAA
       566      36   100.0      30  ACAGATTCAA    ....................................    ATAATGCCGTTGAATTACACGGCAAGGTCA
       632      36   100.0      30  GGCAAGGTCA    ....................................    GAGCGAGCTCGAAATAATCTTAATTACAAG
       698      36   100.0      30  TAATTACAAG    ....................................    GTTCGCTAGCGTCATGTGGTAACGTATTTA
       764      36   100.0      30  AACGTATTTA    ....................................    GGCGTCCCAATCCTGATTAATACTTACTCG
       830      36   100.0      30  TACTTACTCG    ....................................    AACACAGCAAGACAAGAGGATGATGCTATG
       896      36   100.0      29  TGATGCTATG    ....................................    CGACACAAGAACGTATGCAAGAGTTCAAG
       961      36   100.0      30  AGAGTTCAAG    ....................................    ACAATTCTTCATCCGGTAACTGCTCAAGTG
      1027      36   100.0      30  TGCTCAAGTG    ....................................    AATTAAGGGCATAGAAAGGGAGACAACATG
      1093      36   100.0      30  AGACAACATG    ....................................    CGATATTTAAAATCATTTTCATAACTTCAT
      1159      36   100.0      30  ATAACTTCAT    ....................................    GCAGTATCAGCAAGCAAGCTGTTAGTTACT
      1225      36   100.0      30  GTTAGTTACT    ....................................    ATAAACTATGAAATTTTATAATTTTTAAGA
      1291      36   100.0      30  ATTTTTAAGA    ....................................    AATAATTTATGGTATAGCTTAATATCATTG
      1357      36   100.0      30  AATATCATTG    ....................................    TGCATCGAGCACGTTCGAGTTTACCGTTTC
      1423      36   100.0      30  TTACCGTTTC    ....................................    TCTATATCGAGGTCAACTAACAATTATGCT
      1489      36   100.0          CAATTATGCT    ....................................    AATCGTTCAA
==========  ======  ======  ======  ==========    ====================================
        23      36              29                GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC

Array 2
>gi|134103876|gb|EF434469.1| Streptococcus thermophilus strain DGCC7710 CRISPR1 locus genomic sequence

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                        Spacer
==========  ======  ======  ======  ==========    ==========================================    ======
      1948      42    97.6      24  TACAATGGTT    ..T.......................................    TAGAAGCATTTGAGCGTATATTGA
      2014      42    92.9      24  CGTATATTGA    .T......................................AT    TTTGCCCCTTCTTTGCCCCTTGAC
      2080      42   100.0          GCCCCTTGAC    ..........................................    ATTAGCAATCATTTGTGCCCATTG
==========  ======  ======  ======  ==========    ==========================================
         3      42              24                TAGGTTTTTGTACTCTCAAGATTTAAGTAACTGTACAACACC

SUMMARY BY SIMILARITY

Array  Sequence          Position      Length    # Copies  Repeat  Spacer  +  Consensus
=====  ================  ==========  ==========  ========  ======  ======  =  =========
    1  gi|134103876|gb|          38        1487        23      36      29  +  ---GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC---
    2  gi|134103876|gb|        1948         174         3      42      24  +  TAGGTTTTTGTACTCTCAAGATTTAAGTAACTGTACAACACC
                                                                                 ************************************

SUMMARY BY POSITION

>gi|134103876|gb|EF434469.1| Streptococcus thermophilus strain DGCC7710 CRISPR1 locus genomic sequence

Array  Sequence          Position      Length    # Copies  Repeat  Spacer  Distance    Consensus
=====  ================  ==========  ==========  ========  ======  ======  ==========  =========
    1  gi|134103876|gb|          38        1487        23      36      29              GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC
    2  gi|134103876|gb|        1948         174         3      42      24         394  TAGGTTTTTGTACTCTCAAGATTTAAGTAACTGTACAACACC

4. Pectobacterium atrosepticum SCRI1043/ Erwinia carotovora subsp. atroseptica SCRI1043. CRISPR1

########################################
# Program: Crispr Finder Program
# Author: Ibtissem GRISSA
# Rundate (GMT): 7/11/2006 12:6:59
# Report_file: CRISPR1
########################################
#=======================================
#
# Sequence: NC_004547
# Description: Erwinia carotovora subsp. atroseptica SCRI1043, complete genome
# Length: 5064019
# Id: gi|50118965|ref|NC_004547.2|
#
#=========================================================================
# Crispr Rank in the sequence: 1
# Crispr_begin_position: 4124152   Crispr_end_position: 4125860
# DR: TTTCTAAGCTGCCTGTACGGCAGTGAAC   DR_length: 28   Number_of_spacers: 27
#=========================================================================
Spacer_begin_position  Spacer_length  Spacer_sequence
4124180                33             GGGTTGCCTCGGCCTGAACAGCGATCTCGCGTG
4124241                32             TTGAAATTAGTGACGTGAGCAGAATCGTAAAC
4124301                32             CGATACTCGGATACGCTCCCAGACCTTATCGA
4124361                32             AACACGCCGCGCGATTTTGTCCACAATGCGCA
4124421                32             TGTCGAGCGCTGATTCGTCGGTCACTGGATAC
4124481                32             TCCATAGTCCTCGGAAGGGACGACGATGTGAC
4124541                32             CATTAATTGACGTCTCTCTATCAATTATCTGT
4124601                32             CCGAGCCTGTATTACAGGTGGTTATGGCGACA
4124661                32             ACGCTTGATATTGCTTATGGCGTGTTAGTTCA
4124721                32             GATCGGCAAAGATAATCAGGCACTCATCACCG
4124781                32             TGAATGAGCCGGCCAATCTATATCACTACACT
4124841                32             GGCGCGTCAGTGAAGTGGATGTATCCGTGCCA
4124901                32             GCGGGGGCTAATGTGTCTGAGGCTGGTAGATC
4124961                32             CCAGACCGCAGCAACTACCAGTTAGCTCAACA
4125021                32             GAGCAATTCGCGCATGAGTTCGCGTCTACGCT
4125081                32             GCGTTAGGCTCGTCGGTGTAGATCAACTTTCC
4125141                32             TCTTTGTAATCACCTGTACGGCTGGCCCACAC
4125201                32             TTACGGACGCAATCTATGTGCCGTGTAACGAT
4125261                32             ACGCTGCGTAAGCTGGCAACCGGTGAGGTGCA
4125321                32             TTTGTCCTTGAGCCTGACCGCTTCCGCCATCG
4125381                32             CTGTCGCAGTATTTGACCCACGACGTGATGCT
4125441                32             AGCTTGCGTTGCTCGTCTGTCAAATTGCGGGT
4125501                32             AGGCGATGGTGTCGTGGTCTGACCCTGCAAAC
4125561                32             AACCGTCGCTCGCTGGCCACTGTACGATTCGC
4125621                32             ACTCTGTTATTCCCCAACTGGCGTATGCCGAG
4125681                32             CCTGACGGCAACGCCGACGCCGATCCGACACA
4125741                32             AATCAATGGCTCAGGGGATTCTACAACCCTAA
4125801                32             GCATCCGTCCAGACCGTATCGATAGTCTCTGC
#=========================================================================
########################################

5. CRISPR2

6. CRISPR3